A Venn schematic further below illustrates how we set up precision and recall in our context. First, this is a subset of the observed fate percentages we have:
**What we tune**

We need to binarize this matrix so that each lineage receives one or more class labels as ground truth when constructing a confusion matrix and deriving evaluation metrics such as:

1. specificity and sensitivity, which give the ROC curve;
2. precision and recall, which give the PR curve, the F1 score, etc.

We denote by $\tau$ the minimum observed percentage of a cell label required to assign that label to a lineage. For example, with $\tau = 0.2$ and $\tau = 0.4$:

- the first highlighted lineage (Baso 0.1, Eos 0.1, Neu 0.8) is labeled only `Neu` in both cases;
- the second (Neu 0.7, Mo 0.3) is labeled `Neu; Mo` and only `Neu`, respectively.
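As a minimal sketch of this binarization step (the `binarize` helper and the toy DataFrame are illustrative, not part of our pipeline; the two rows mirror the highlighted lineages in the table below):

```python
import pandas as pd

# Toy observed fate-percentage rows mirroring the two highlighted lineages
# in the table below (columns abbreviated; values are observed fractions).
pct = pd.DataFrame(
    {"Neu": [0.8, 0.7], "Mo": [0.0, 0.3], "Eos": [0.1, 0.0], "Baso": [0.1, 0.0]},
    index=["lineage_1", "lineage_2"],
)

def binarize(pct: pd.DataFrame, tau: float) -> pd.DataFrame:
    """Assign a fate label wherever the observed share reaches tau."""
    return (pct >= tau).astype(int)

# At tau = 0.2, lineage_2 is labeled Neu and Mo; at tau = 0.4, only Neu remains.
labels_02 = binarize(pct, 0.2)
labels_04 = binarize(pct, 0.4)
```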
| Ery | Meg | Mast | Baso | Eos | Neu | Mo  | Ccr7_DC | pDC | Lymphoid |
|-----|-----|------|------|-----|-----|-----|---------|-----|----------|
| 0.0 | 0.0 | 0.0  | 0.0  | 0.0 | 1.0 | 0.0 | 0.0     | 0.0 | 0.0      |
| 0.0 | 0.0 | 0.0  | 0.0  | 1.0 | 0.0 | 0.0 | 0.0     | 0.0 | 0.0      |
| 0.0 | 0.0 | 0.0  | 0.0  | 0.0 | 1.0 | 0.0 | 0.0     | 0.0 | 0.0      |
| 0.0 | 0.0 | 0.0  | 0.1  | 0.1 | 0.8 | 0.0 | 0.0     | 0.0 | 0.0      |
| 0.0 | 0.0 | 0.0  | 0.0  | 0.0 | 1.0 | 0.0 | 0.0     | 0.0 | 0.0      |
| 0.0 | 0.0 | 0.0  | 0.0  | 0.0 | 1.0 | 0.0 | 0.0     | 0.0 | 0.0      |
| 0.0 | 0.0 | 0.0  | 0.0  | 0.0 | 1.0 | 0.0 | 0.0     | 0.0 | 0.0      |
| 0.0 | 0.0 | 0.0  | 0.0  | 0.0 | 0.7 | 0.3 | 0.0     | 0.0 | 0.0      |
| 0.0 | 0.0 | 0.0  | 0.0  | 0.0 | 1.0 | 0.0 | 0.0     | 0.0 | 0.0      |
| 0.0 | 0.0 | 0.0  | 0.0  | 0.0 | 1.0 | 0.0 | 0.0     | 0.0 | 0.0      |
**Precision-Recall in our context**

Even in the multi-class setting, precision and recall are computed first within each class, in our case each fate. Once the threshold $\tau$ is set, the recall denominator (the number of observed positives per fate) is fixed. Then, as we decrease the threshold $p$ at which a lineage's simulated percentage qualifies it for a fate, the precision denominator grows, and the numerator (the white intersection in the Venn schematic) can only stay the same or grow; recall therefore either stays unchanged or increases. Precision, being the ratio of two growing counts, can change non-monotonically.
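A tiny numeric check of this behavior, using hypothetical ground truth and scores for a single fate (not real data):

```python
import numpy as np

# Hypothetical ground-truth labels and simulated scores for one fate.
y_true = np.array([1, 0, 1, 1])
y_score = np.array([0.9, 0.8, 0.6, 0.5])

def pr_at(p):
    """Precision and recall when calling positives at score > p."""
    pred = y_score > p
    tp = int(np.sum(pred & (y_true == 1)))
    return tp / int(pred.sum()), tp / int((y_true == 1).sum())

# Lowering p: recall never decreases, while precision dips and then rises.
results = [pr_at(p) for p in (0.85, 0.70, 0.55, 0.40)]
```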
Micro-averaging pools these per-fate numerators and denominators into a single summary metric, analogous to the average Pearson correlation we tried before.
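For two fates, this pooling can be sketched with toy counts (the numbers are made up for illustration; `A`/`B` follow the notation of the Venn schematic):

```python
# Micro-averaging pools numerators and denominators across fates before
# taking the ratio (toy counts, not real results; A = true positives,
# B = precision denominators, i.e. predicted positives).
A = {"Neu": 40, "Mo": 10}
B = {"Neu": 50, "Mo": 25}

per_fate = {k: A[k] / B[k] for k in A}      # Neu: 0.8, Mo: 0.4
micro = sum(A.values()) / sum(B.values())   # (40 + 10) / (50 + 25)
macro = sum(per_fate.values()) / len(A)     # simple mean of per-fate precisions
```

Note that micro-averaging weights each prediction equally, so the fate with more predictions (`Neu` here) dominates, whereas the macro average weights each fate equally.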
---
title: Precision Recall
format:
  html:
    toc: true
    theme: cosmo
    echo: false
    eval: true
    code-fold: true
    code-tools: true
    code-line-numbers: true
    embed-resources: true
    self-contained-math: true
---

```{python}
#| label: data-loading
import os

import plotly.express as px
import plotly.graph_objects as go

os.chdir("/home/ruitong/scDiffEq_PyTorch/scDiffEqDev/")
from InHouse.larry_prepare import *
from InHouse.fate_evaluation import *

true2081 = loadPickle("InHouse/data/Reindex_2081Fate")
fate11 = loadPickle("InHouse/data/Reindex_2081_Fate11")
fate10 = loadPickle("InHouse/data/Reindex_2081Fate")
colors = loadPickle("InHouse/data/scDiffEq_colorp")
```

### Multi-class Precision-Recall

```{python}
#| label: df-format
def highlight_vals(val, vmin=0, vmax=1, color='lightyellow'):
    # Highlight fractional (mixed-fate) entries; see
    # https://towardsdatascience.com/a-quick-and-easy-guide-to-conditional-formatting-in-pandas-8783035071ee
    if vmin < val < vmax:
        return 'background-color: %s' % color
    return ''
```

```{python}
fateD = fate10
pctD = fateD.div(fateD.sum(axis=1), axis=0)
abbvDict = {'Neutrophil': 'Neu', 'Monocyte': 'Mo', 'Erythroid': 'Ery'}
pctD.rename(columns=abbvDict, inplace=True)
formatDict = dict((c, "{:,.1f}") for c in list(pctD.columns))
pctD.iloc[6:16, :].style.applymap(highlight_vals)\
    .format(formatter=formatDict)\
    .set_properties(**{'text-align': 'center'})\
    .hide_index()
```

```{python}
#| eval: false
fateD = fate10
pctD = fateD.div(fateD.sum(axis=1), axis=0)
abbvDict = {'Neutrophil': 'Neu', 'Monocyte': 'Mo', 'Erythroid': 'Ery'}
pctD.rename(columns=abbvDict, inplace=True)
pd.options.display.float_format = '{:,.1f}'.format
pctD.head(15).to_html(index=False, justify='left', col_space="20px")
```

```{python}
import matplotlib.pyplot as plt
from matplotlib_venn import venn2, venn2_circles

vennset = (600, 200, 500)
v = venn2(subsets=vennset,
          set_labels=('B: Simulated Fate \n Precision Denominator',
                      'C: Observed Fate \n Recall Denominator'),
          set_colors=("orange", "lightblue"), alpha=0.7)
venn2_circles(subsets=vennset, linestyle="dashed", linewidth=2)
v.get_patch_by_id("11").set_color("white")
v.get_label_by_id("11").set_text("A: Numerator")
for text in v.set_labels:
    text.set_fontsize(10)
plt.title("Per cell type/fate")
plt.axis('on')
plt.text(0.4, 0.5, 'Sample Space: 2081',
         horizontalalignment='center', verticalalignment='center')
plt.show();
```

$$
\begin{aligned}
\mathrm{Precision_{Neu}} &= \frac{A_{Neu}}{B_{Neu}} \\
\mathrm{Precision_{Mo}} &= \frac{A_{Mo}}{B_{Mo}} \\
\mathrm{Precision_{Micro}} &= \frac{A_{Neu} + A_{Mo}}{B_{Neu} + B_{Mo}}
\end{aligned}
$$

```{python}
def PR_multiclass(Y_test, y_score, pclasses, avgm="micro", t=0.3):
    # Binarize the observed percentages at threshold t, then compute
    # per-class and micro-averaged PR curves.
    Y_test = np.where(Y_test > t, 1, 0)
    precision, recall, average_precision, au_PR = [dict() for _ in range(4)]
    for i, k in enumerate(pclasses):
        precision[k], recall[k], _ = precision_recall_curve(Y_test[:, i], y_score[:, i])
        average_precision[k] = average_precision_score(Y_test[:, i], y_score[:, i])
    precision[avgm], recall[avgm], _ = precision_recall_curve(Y_test.ravel(), y_score.ravel())
    average_precision[avgm] = average_precision_score(Y_test, y_score, average=avgm)
    for k in average_precision.keys():
        au_PR[k] = auc(recall[k], precision[k])
    return [t, precision, recall, average_precision, au_PR]


def PR_curve(PRmetric, suffix, s=3, avgm="micro"):
    p, precision, recall, average_precision, auPR = PRmetric
    _, ax = plt.subplots(figsize=(s, s))
    # iso-F1 reference curves
    f_scores = np.linspace(0.2, 0.8, num=4)
    for f_score in f_scores:
        x = np.linspace(0.01, 1)
        y = f_score * x / (2 * x - f_score)
        (l,) = plt.plot(x[y >= 0], y[y >= 0], '--', color="darkgray", alpha=0.5)
        plt.annotate("F1={0:0.1f}".format(f_score), xy=(0.7, y[45] + 0.02))
    colors = ['#0a9396', '#bb3e03', '#ee9b00', '#023047']
    pr_class = ['Neutrophil', 'Monocyte', 'Baso', avgm]
    for i, color in zip(pr_class, colors):
        display = PrecisionRecallDisplay(
            recall=recall[i],
            precision=precision[i],
            average_precision=average_precision[i])
        display.plot(ax=ax, name=f"PR: {i}", color=color)
    # add the legend entry for the iso-F1 curves
    handles, labels = display.ax_.get_legend_handles_labels()
    handles.extend([l])
    labels.extend(["iso-F1 curves"])
    # set the legend and the axes
    ax.set_xlim([0.0, 1.0])
    ax.set_ylim([0.0, 1.05])
    ax.legend(handles=handles, labels=labels, loc=3, prop={'size': 8})
    ax.set_title(f"Multi-class Precision Recall: \n{suffix}" + f" | t = {p}")
    plt.show()


def simPR(sim, p, s=3, f="nmb", suffix="TEST", plot=False):
    saveOut = "./Analysis/task2_comparison/data"
    fatedict = loadPickle(f"{saveOut}/FateTasks2")
    obsPerc = loadPickle(f"{saveOut}/obs_perc2")
    simPerc = {f: cnt2perc(sim, f) for f in fatedict.keys()}
    Y_test = obsPerc[f]
    y_score = simPerc[f]
    pclasses = fatedict[f]
    testl = PR_multiclass(Y_test, y_score, pclasses, t=p)
    if plot:
        PR_curve(testl, suffix=suffix, s=s)
    return testl
```

```{python}
#| label: sim-fetch
saveOut = "./Analysis/task2_comparison/data"
sdqSim = loadPickle(f"{saveOut}/scdiffeq_simout")
presSim = loadPickle(f"{saveOut}/prescient_simout")
```

Please see below for the precision-recall curves of the two models when setting $\tau = 0.4$.

```{python}
#| label: scdiffeq-pr
PRmetric = simPR(sdqSim[('scDiffEq2', 'mu2000-sigma800-ss0.8', 'seed3', 'KEGGyes')],
                 0.4, s=5, f="all", suffix="scDiffEq (KEGG) ALL", plot=True)
```

```{python}
#| label: PRESCIENT-pr
PRmetric2 = simPR(presSim[('prescient', '1e-05', 'seed1', 'KEGGyes')],
                  0.4, s=5, f="all", suffix="PRESCIENT (KEGG) ALL", plot=True)
```

```{python}
prange = np.linspace(0.2, 0.5, 4)
pres_model = presSim[('prescient', '1e-05', 'seed1', 'KEGGyes')]
scdiffeq_model = sdqSim[('scDiffEq2', 'mu2000-sigma800-ss0.8', 'seed3', 'KEGGyes')]
avgp1 = [simPR(sim=pres_model, p=i, f="all")[3]["micro"] for i in prange]
avgp2 = [simPR(scdiffeq_model, p=i, f="all")[3]["micro"] for i in prange]
scannedOut = pd.DataFrame({"prescient": avgp1, "scdiffeq": avgp2, "threshold": prange})
```

Across this scan, increasing $\tau$ generally lowers the micro-average precision, but scDiffEq stays ahead of PRESCIENT at every threshold.

```{python}
fig = px.line(scannedOut.melt(id_vars=['threshold']),
              x="threshold", y="value", color="variable",
              markers=True, template="presentation", width=500, height=500)
fig.update_layout(xaxis_title="threshold tau",
                  yaxis_title="Micro Average Precision",
                  font_family="Courier", font_size=16)
fig.show()
```